Search results for "Information Storage and Retrieval"

showing 10 items of 48 documents

Reactome graph database: Efficient access to complex pathway data

2018

Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because queries that traverse highly interconnected data suffer from performance issues. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its qu…

0301 basic medicine; Databases Factual; Computer science; Data management; Knowledge Bases; Social Sciences; Information Storage and Retrieval; NoSQL; computer.software_genre; Computer Applications; Database and Informatics Methods; User-Computer Interface; 0302 clinical medicine; Knowledge extraction; Psychology; Database Searching; lcsh:QH301-705.5; Data Management; Language; Biological data; Ecology; Systems Biology; Genomics; Genomic Databases; Computational Theory and Mathematics; Modeling and Simulation; Web-Based Applications; Graph (abstract data type); Information Technology; Research Article; Computer and Information Sciences; Relational database; Query language; Research and Analysis Methods; Ecosystems; 03 medical and health sciences; Cellular and Molecular Neuroscience; Databases; Genetics; Computer Graphics; Humans; Molecular Biology; Ecology Evolution Behavior and Systematics; Internet; Information retrieval; Graph database; business.industry; Ecology and Environmental Sciences; Cognitive Psychology; Biology and Life Sciences; Computational Biology; Genome Analysis; Relational Databases; 030104 developmental biology; Biological Databases; lcsh:Biology (General); Cognitive Science; business; computer; 030217 neurology & neurosurgery; Software; Neuroscience

PLoS Computational Biology

researchProduct

FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications

2017

MapReduce Hadoop bioinformatics applications require special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats such as FASTA or BAM. Moreover, developing these routines is not easy, both because of the diversity of these formats and because of the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…

0301 basic medicine; FASTQ format; Statistics and Probability; Computer science; Sequence analysis; media_common.quotation_subject; Information Storage and Retrieval; Bioinformatics; computer.software_genre; Genome; Biochemistry; Domain (software engineering); 03 medical and health sciences; Humans; Genomic library; Quality (business); DNA sequencing; Molecular Biology; media_common; Gene Library; Sequence; Database; Settore INF/01 - Informatica; Genome Human; Computer Science Applications; 1707 Computer Vision and Pattern Recognition; Genomics; Sequence Analysis DNA; FASTQ; File format; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; NGS; Database Management Systems; computer


Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research

2015

A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access phenotypic, clinical and other information about samples that are currently stored at biobanks in an integrated manner. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate glo…

0301 basic medicine; Netherlands Twin Register (NTR); Databases Factual; Computer science; Information Storage and Retrieval; Sample (statistics); Ontology (information science); Endocrinology and Diabetes; Bioinformatics; computer.software_genre; data archives; Article; 03 medical and health sciences; SDG 17 - Partnerships for the Goals; SDG 3 - Good Health and Well-being; Genetics; /dk/atira/pure/keywords/cohort_studies/netherlands_twin_register_ntr_; Use case; biomedical data; Genetics (clinical); Biological Specimen Banks; Genetics & Heredity; 0604 Genetics; Bioinformatics (Computational Biology); ta112; ta1184; /dk/atira/pure/sustainabledevelopmentgoals/partnerships; Data science; Biobank; 3. Good health; cross-biobank research; 030104 developmental biology; Project planning; Exchange of information; Disparate system; Privacy; Bioinformatik (beräkningsbiologi); /dk/atira/pure/sustainabledevelopmentgoals/good_health_and_well_being; clinical data; computer; Data integration

European Journal of Human Genetics


Active learning strategies for the deduplication of electronic patient data using classification trees.

2012

Highlights: Active learning for medical record linkage is used on a large data set. We compare a simple active learning strategy with a more sophisticated variant. The active learning method of Sarawagi and Bhamidipaty (2002) [6] is extended. We deliver insights into the variations of the results due to random sampling in the active learning strategies. Introduction: Supervised record linkage methods often require a clerical review to gain informative training data. Active learning means actively prompting the user to label data with special characteristics in order to minimise the review costs. We conducted an empirical evaluation to investigate whether…

Active learning; Computer science; Active learning (machine learning); Information Storage and Retrieval; Context (language use); Health Informatics; Semi-supervised learning; Machine learning; computer.software_genre; Set (abstract data type); Artificial Intelligence; Bagging; Data deduplication; Electronic Health Records; Humans; business.industry; String (computer science); Decision Trees; Online machine learning; Computer Science Applications; Data mining; Artificial intelligence; Medical Record Linkage; String metric; business; computer; Algorithms

Journal of biomedical informatics


BlotBase: a northern blot database.

2008

With the availability of high-throughput gene expression analysis, multiple public expression databases emerged, mostly based on microarray expression data. Although these databases are of significant biomedical value, they have significant drawbacks, especially concerning the reliability of single gene expression profiles obtained from microarray data. At the same time, reliable data on an individual gene's expression are often published as single northern blots in individual publications. These data were not yet available for high-throughput screening. To reduce the gap between high-throughput expression data and individual highly reliable expression data, we designed a novel database "Blo…

Bar chart; HUGO Gene Nomenclature Committee; Value (computer science); Information Storage and Retrieval; Biology; computer.software_genre; Polymerase Chain Reaction; law.invention; Mice; law; Genetics; Computer Graphics; Microarray databases; Animals; Humans; Northern blot; Databases Protein; DNA Primers; Internet; Database; Microarray analysis techniques; Sequence Analysis RNA; Gene Expression Profiling; Full text search; Computational Biology; General Medicine; Blotting Northern; Gene expression profiling; Database Management Systems; computer; Software

Gene


Automatic detection of large dense-core vesicles in secretory cells and statistical analysis of their intracellular distribution.

2010

Analyzing the morphological appearance and the spatial distribution of large dense-core vesicles (granules) in the cell cytoplasm is central to the understanding of regulated exocytosis. This paper is concerned with the automatic detection of granules and the statistical analysis of their spatial locations in different cell groups. We model the locations of granules of a given cell as a realization of a finite spatial point process and the point patterns associated with the cell groups as replicated point patterns of different spatial point processes. First, an algorithm to segment the granules using electron microscopy images is proposed. Second, the relative locations of the granules with…

Chromaffin Cells; Information Storage and Retrieval; Biology; Bioinformatics; Models Biological; Sensitivity and Specificity; Point process; Exocytosis; law.invention; Pattern Recognition Automated; Mice; law; Artificial Intelligence; Image Interpretation Computer-Assisted; Genetics; Animals; Secretion; Chromaffin Granules; Computer Simulation; Cells Cultured; Models Statistical; Applied Mathematics; Vesicle; Secretory Vesicles; Reproducibility of Results; Image Enhancement; Empirical distribution function; Microscopy Electron; Animals Newborn; Cytoplasm; Data Interpretation Statistical; Electron microscope; Biological system; Intracellular; Algorithms; Biotechnology

IEEE/ACM transactions on computational biology and bioinformatics


Support vector machines for nonlinear kernel ARMA system identification.

2006

Nonlinear system identification based on support vector machines (SVM) has usually been addressed by means of the standard SVM regression (SVR), which can be seen as an implicit nonlinear autoregressive and moving average (ARMA) model in some reproducing kernel Hilbert space (RKHS). The proposal of this letter is twofold. First, the explicit consideration of an ARMA model in an RKHS (SVM-ARMA2K) is proposed. We show that stating the ARMA equations in an RKHS leads to solving the regularized normal equations in that RKHS, in terms of the autocorrelation and cross-correlation of the (nonlinearly) transformed input and output discrete-time processes. Second, a general class of SVM-based syste…

Computer Science::Machine Learning; Statistics::Theory; Computer Networks and Communications; Biomedical signal processing; Information Storage and Retrieval; Machine learning; computer.software_genre; Pattern Recognition Automated; Statistics::Machine Learning; Artificial Intelligence; Applied mathematics; Statistics::Methodology; Autoregressive–moving-average model; Computer Simulation; Mathematics; Telecomunicaciones; Hardware_MEMORYSTRUCTURES; Support vector machines; Models Statistical; Nonlinear system identification; business.industry; Autocorrelation; System identification; Signal Processing Computer-Assisted; General Medicine; Computer Science Applications; Support vector machine; Nonlinear system; Kernel; Autoregressive model; Nonlinear Dynamics; ARMA modelling; 3325 Tecnología de las Telecomunicaciones; Artificial intelligence; Neural Networks Computer; business; computer; Software; Algorithms; Reproducing kernel Hilbert space

IEEE transactions on neural networks


Kernel manifold alignment for domain adaptation

2016

The wealth of sensory data coming from different modalities has opened numerous opportunities for data analysis. The data are of increasing volume, complexity and dimensionality, thus calling for new methodological innovations towards multimodal data processing. However, multimodal architectures must rely on models able to adapt to changes in the data distribution. Differences in the density functions can be due to changes in acquisition conditions (pose, illumination), sensor characteristics (number of channels, resolution) or different views (e.g. street level vs. aerial views of the same building). We call these different acquisition modes domains, and refer to the adaptation proble…

Computer and Information Sciences; Kernel Functions; Information Storage and Retrieval; Social Sciences; lcsh:Medicine; 1100 General Agricultural and Biological Sciences; Research and Analysis Methods; Infographics; Topology; Pattern Recognition Automated; Kernel Methods; Cognition; Learning and Memory; Memory; 1300 General Biochemistry Genetics and Molecular Biology; Image Interpretation Computer-Assisted; Data Mining; Humans; Psychology; Life Science; 910 Geography & travel; Operator Theory; Manifolds; lcsh:Science; Object Recognition; 1000 Multidisciplinary; Applied Mathematics; Simulation and Modeling; Data Visualization; lcsh:R; Cognitive Psychology; Biology and Life Sciences; Eigenvalues; Facial Expression; Algebra; 10122 Institute of Geography; Linear Algebra; Data Interpretation Statistical; Physical Sciences; Cognitive Science; Perception; lcsh:Q; Eigenvectors; Graphs; Algorithms; Mathematics; Research Article; Neuroscience


Characterization of entropy measures against data loss: Application to EEG records

2012

This study aims to characterize three signal entropy measures, Approximate Entropy (ApEn), Sample Entropy (SampEn) and Multiscale Entropy (MSE), over real EEG signals when a number of samples are randomly lost due to, for example, wireless data transmission. The experimental EEG database comprises two main signal groups: control EEGs and epileptic EEGs. Results show that both SampEn and ApEn enable a clear distinction between control and epileptic signals, but SampEn shows more robust performance over a wide range of sample loss ratios. MSE performs poorly for sample loss ratios above 40%. The EEG non-stationary and random trends are kept even when a great number of samp…

Computer science; Entropy; Information Storage and Retrieval; Data loss; Electroencephalography; Sensitivity and Specificity; Approximate entropy; Multiscale entropy; Entropy (classical thermodynamics); Seizures; Statistics; medicine; Humans; Entropy (information theory); Entropy (energy dispersal); Entropy (arrow of time); medicine.diagnostic_test; business.industry; Entropy (statistical thermodynamics); Reproducibility of Results; Pattern recognition; Sample entropy; Artificial intelligence; Artifacts; business; Algorithms; Entropy (order and disorder)

2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society


CoCoDat: a database system for organizing and selecting quantitative data on single neurons and neuronal microcircuitry.

2004

We present a novel database system for organizing and selecting quantitative experimental data on single neurons and neuronal microcircuitry that has proven useful for reference-keeping, experimental planning and computational modelling. Building on our previous experience with large neuroscientific databases, the system takes into account the diversity and method-dependence of single cell and microcircuitry data and provides tools for entering and retrieving published data without a priori interpretation or summarizing. Data representation is based on the framework suggested by biophysical theory and enables flexible combinations of data on membrane conductances, ionic and synaptic current…

Computer science; computer.internet_protocol; Relational database; Models Neurological; Action Potentials; Information Storage and Retrieval; computer.software_genre; Machine learning; External Data Representation; Data retrieval; Animals; Computer Simulation; Layer (object-oriented design); Neurons; Database; business.industry; General Neuroscience; Experimental data; Rats; Data sharing; Scalability; Database Management Systems; Artificial intelligence; Neural Networks Computer; Nerve Net; business; computer; XML

Journal of neuroscience methods
